CoSIA (Cross Species Investigation and Analysis) is an R package that gives researchers tools to measure and visualize gene-expression metrics for comparison across model organisms and their tissues. Specifically, CoSIA uses curated, non-diseased, wild-type RNA-sequencing expression data from Bgee to visualize a gene’s expression across tissues and model organisms. CoSIA also streamlines conversion between gene identifiers, both within a species and between species.

Figure 1. CoSIA_Workflow

CoSIA is split into three modules that give researchers the resources to conduct cross-species analyses using gene expression metrics.


Module 1 uses getConversion to convert inputs between gene identifiers within a species and to orthologs in other species. The other modules access tissue- and/or species-specific gene expression. Module 2 uses getGEx to obtain VST-transformed gene expression values of a single gene across multiple tissues in a single model organism, or across multiple model organisms in a single tissue. The plotting methods plotTissueGEx and plotSpeciesGEx visualize gene expression values in this module. Module 3 uses getGExMetrics to calculate the median-based Coefficient of Variation (variability) and Shannon Entropy (diversity and specificity). Two accompanying plotting methods, plotCVGEx and plotDSGEx, visualize the variation and diversity of gene expression across tissues and model organisms.


1.1 Installation

In R:


# Option 1: install from Bioconductor
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install("CoSIA")

# Option 2: install the development version from GitHub
devtools::install_github("lasseignelab/CoSIA", ref = "main")

1.2 Generating a CoSIAn object

1.2.1 Load CoSIA


library(CoSIA)

load("~/Desktop/EH_Data.RData")
#load("../../Learning/useCoSIA/data/EH_Data.RData")
load("../inst/extdata/proccessed/monogenic_kidney_genes.rda")

1.2.2 Arguments and options table

Slot Name                         | Possible Value Options                                                  | Default
gene_set                          | “character”, c(characters), data.frame$column                           | N/A
i_species; o_species; map_species | h_sapiens, m_musculus, r_norvegicus, d_rerio, d_melanogaster, c_elegans | N/A
mapping_tool                      | annotationDBI, biomaRt                                                  | annotationDBI
input_id; output_ids              | Ensembl_id, Entrez_id, Symbol                                           | N/A
ortholog_database                 | NCBIOrtho, HomoloGene                                                   | HomoloGene
map_tissues                       | c(“tissue”), “tissue”; see getTissues                                   | N/A
metric_type                       | CV_Tissue, CV_Species, DS_Gene, DS_Gene_all, DS_Tissue, DS_Tissue_all   | N/A

1.2.3 Find possible tissues with getTissues

The function getTissues retrieves tissues available for a single species:


CoSIA::getTissues("d_rerio")
#> # A tibble: 25 × 1
#>    Common_Anatomical_Entity_Name
#>    <chr>                        
#>  1 blastula                     
#>  2 bone element                 
#>  3 brain                        
#>  4 camera-type eye              
#>  5 early embryo                 
#>  6 embryo                       
#>  7 gastrula                     
#>  8 granulocyte                  
#>  9 head                         
#> 10 head kidney                  
#> # … with 15 more rows

…or tissues shared across a list of species:


CoSIA::getTissues(c("h_sapiens", "m_musculus", "r_norvegicus"))
#> # A tibble: 17 × 1
#>    Common_Anatomical_Entity_Name
#>    <chr>                        
#>  1 adult mammalian kidney       
#>  2 brain                        
#>  3 cerebellum                   
#>  4 colon                        
#>  5 duodenum                     
#>  6 esophagus                    
#>  7 frontal cortex               
#>  8 heart                        
#>  9 kidney                       
#> 10 liver                        
#> 11 lung                         
#> 12 ovary                        
#> 13 pancreas                     
#> 14 skeletal muscle tissue       
#> 15 spleen                       
#> 16 stomach                      
#> 17 testis
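
Conceptually, the multi-species call can be pictured as an intersection of per-species tissue lists. The following is a minimal sketch of that idea in base R, not CoSIA's internal implementation; the list `tissues_by_species` and its contents are hypothetical and chosen only for illustration.

```r
# Hypothetical per-species tissue lists (illustrative values only,
# not the contents of the Bgee-derived data used by CoSIA).
tissues_by_species <- list(
  h_sapiens    = c("brain", "heart", "liver", "lung", "testis"),
  m_musculus   = c("brain", "heart", "liver", "spleen"),
  r_norvegicus = c("brain", "heart", "kidney", "liver")
)

# Keep only the tissues that appear in every species.
shared_tissues <- Reduce(intersect, tissues_by_species)
print(shared_tissues)
```

Adding a species to the list can only shrink or preserve the shared set, which is consistent with the three-species call above returning fewer tissues than a single-species call would.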

1.2.4 Initiating a CoSIAn object


CoSIAn_Obj <- CoSIA::CoSIAn(gene_set = unique(monogenic_kidney_genes$Gene),
                            i_species = "h_sapiens",
                            o_species = c("h_sapiens", "m_musculus", "r_norvegicus"),
                            input_id = "Symbol",
                            output_ids = "Ensembl_id",
                            map_species = c("h_sapiens", "m_musculus", "r_norvegicus"),
                            map_tissues = c("adult mammalian kidney", "heart"),
                            mapping_tool = "annotationDBI",
                            ortholog_database = "HomoloGene",
                            metric_type = "CV_Species"
                            )

str(CoSIAn_Obj)
#> Formal class 'CoSIAn' [package "CoSIA"] with 13 slots
#>   ..@ gene_set         : chr [1:386] "CYP17A1" "OPLAH" "NOTCH2" "SALL4" ...
#>   ..@ i_species        : chr "h_sapiens"
#>   ..@ input_id         : chr "Symbol"
#>   ..@ o_species        : chr [1:3] "h_sapiens" "m_musculus" "r_norvegicus"
#>   ..@ output_ids       : chr "Ensembl_id"
#>   ..@ mapping_tool     : chr "annotationDBI"
#>   ..@ ortholog_database: chr "HomoloGene"
#>   ..@ converted_id     :'data.frame':    1 obs. of  1 variable:
#>   .. ..$ X0: num 0
#>   ..@ map_tissues      : chr [1:2] "adult mammalian kidney" "heart"
#>   ..@ map_species      : chr [1:3] "h_sapiens" "m_musculus" "r_norvegicus"
#>   ..@ gex              :'data.frame':    1 obs. of  1 variable:
#>   .. ..$ X0: num 0
#>   ..@ metric_type      : chr "CV_Species"
#>   ..@ metric           :'data.frame':    1 obs. of  1 variable:
#>   .. ..$ X0: num 0

1.3 Use Cases with monogenic kidney disease-associated genes

The following use cases provide running examples of CoSIA applications with Natera’s Monogenic Kidney Disease Panel. We will perform ID conversion, obtain and visualize gene expression data, and calculate and visualize the variation and diversity of gene expression across three species (human, mouse, and rat) and two tissues (kidney and heart).

1.3.1 Use Case #1: Converting gene symbols to Ensembl IDs

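CoSIA's gene expression functionality requires Ensembl IDs, so other identifiers (here, gene symbols) must first be converted with getConversion. Pay attention to any warnings or error messages raised during conversion, as they are meant to be informative.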


CoSIAn_Obj_convert <- CoSIA::getConversion(CoSIAn_Obj)

str(CoSIAn_Obj_convert)
#> Formal class 'CoSIAn' [package "CoSIA"] with 13 slots
#>   ..@ gene_set         : chr [1:386] "CYP17A1" "OPLAH" "NOTCH2" "SALL4" ...
#>   ..@ i_species        : chr "h_sapiens"
#>   ..@ input_id         : chr "Symbol"
#>   ..@ o_species        : chr [1:3] "h_sapiens" "m_musculus" "r_norvegicus"
#>   ..@ output_ids       : chr "Ensembl_id"
#>   ..@ mapping_tool     : chr "annotationDBI"
#>   ..@ ortholog_database: chr "HomoloGene"
#>   ..@ converted_id     :'data.frame':    411 obs. of  4 variables:
#>   .. ..$ h_sapiens_symbol       : chr [1:411] "CYP17A1" "OPLAH" "NOTCH2" "SALL4" ...
#>   .. ..$ h_sapiens_ensembl_id   : chr [1:411] "ENSG00000148795" "ENSG00000178814" "ENSG00000134250" "ENSG00000101115" ...
#>   .. ..$ m_musculus_ensembl_id  : chr [1:411] "ENSMUSG00000003555" "ENSMUSG00000022562" "ENSMUSG00000027878" "ENSMUSG00000027547" ...
#>   .. ..$ r_norvegicus_ensembl_id: chr [1:411] "ENSRNOG00000020035" "ENSRNOG00000011781" "ENSRNOG00000018835" "ENSRNOG00000050035" ...
#>   ..@ map_tissues      : chr [1:2] "adult mammalian kidney" "heart"
#>   ..@ map_species      : chr [1:3] "h_sapiens" "m_musculus" "r_norvegicus"
#>   ..@ gex              :'data.frame':    1 obs. of  1 variable:
#>   .. ..$ X0: num 0
#>   ..@ metric_type      : chr "CV_Species"
#>   ..@ metric           :'data.frame':    1 obs. of  1 variable:
#>   .. ..$ X0: num 0
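
In the output above, 386 input symbols produce 411 rows in converted_id. One plausible explanation, stated here as an assumption rather than something confirmed by the output, is that some symbols map to more than one identifier or ortholog. The sketch below reproduces that effect with a toy lookup table in base R; the gene names, IDs, and the `mapping` table are all hypothetical.

```r
# Hypothetical input gene set.
genes <- data.frame(symbol = c("GENE_A", "GENE_B"))

# Hypothetical lookup table in which GENE_A maps to two identifiers.
mapping <- data.frame(
  symbol      = c("GENE_A", "GENE_A", "GENE_B"),
  ortholog_id = c("ID_1", "ID_2", "ID_3")
)

# A one-to-many join yields more rows than input genes.
converted <- merge(genes, mapping, by = "symbol")
print(converted)
print(nrow(converted))
```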

1.3.2 Use Case #2: Obtaining and visualizing curated non-diseased kidney and heart gene expression data for human, mouse, and rat from Bgee

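Gene identifiers must be converted to Ensembl IDs (Use Case #1) before getGEx can retrieve expression data; watch for warning and error messages along the way. Still to be documented here: the limitations of multi-tissue and multi-species queries, and guidance on reading the plot, in particular whether expression should be compared between species directly or whether a tissue plot is the better choice.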


CoSIAn_Obj_gex <- CoSIA::getGEx(CoSIAn_Obj_convert)

str(CoSIAn_Obj_gex)
#> Formal class 'CoSIAn' [package "CoSIA"] with 13 slots
#>   ..@ gene_set         : chr [1:386] "CYP17A1" "OPLAH" "NOTCH2" "SALL4" ...
#>   ..@ i_species        : chr "h_sapiens"
#>   ..@ input_id         : chr "Symbol"
#>   ..@ o_species        : chr [1:3] "h_sapiens" "m_musculus" "r_norvegicus"
#>   ..@ output_ids       : chr "Ensembl_id"
#>   ..@ mapping_tool     : chr "annotationDBI"
#>   ..@ ortholog_database: chr "HomoloGene"
#>   ..@ converted_id     :'data.frame':    411 obs. of  4 variables:
#>   .. ..$ h_sapiens_symbol       : chr [1:411] "CYP17A1" "OPLAH" "NOTCH2" "SALL4" ...
#>   .. ..$ h_sapiens_ensembl_id   : chr [1:411] "ENSG00000148795" "ENSG00000178814" "ENSG00000134250" "ENSG00000101115" ...
#>   .. ..$ m_musculus_ensembl_id  : chr [1:411] "ENSMUSG00000003555" "ENSMUSG00000022562" "ENSMUSG00000027878" "ENSMUSG00000027547" ...
#>   .. ..$ r_norvegicus_ensembl_id: chr [1:411] "ENSRNOG00000020035" "ENSRNOG00000011781" "ENSRNOG00000018835" "ENSRNOG00000050035" ...
#>   ..@ map_tissues      : chr [1:2] "adult mammalian kidney" "heart"
#>   ..@ map_species      : chr [1:3] "h_sapiens" "m_musculus" "r_norvegicus"
#>   ..@ gex              :'data.frame':    2166 obs. of  8 variables:
#>   .. ..$ Anatomical_entity_name: chr [1:2166] "adult mammalian kidney" "adult mammalian kidney" "adult mammalian kidney" "adult mammalian kidney" ...
#>   .. ..$ Ensembl_ID            : chr [1:2166] "ENSG00000285723" "ENSG00000170381" "ENSG00000171365" "ENSG00000171303" ...
#>   .. ..$ Sample_size           : int [1:2166] 23 23 23 23 23 23 23 23 23 23 ...
#>   .. ..$ VST                   : chr [1:2166] "3.54564422116888, 4.92275162664869, 5.75395660803669, 6.68361521764915, 6.57409970660495, 6.21429873820157, 5.6"| __truncated__ "6.56205678528148, 6.54346016245676, 5.37017712209594, 5.47643859281315, 5.52574186004803, 5.14210338316864, 4.4"| __truncated__ "12.0489682651901, 12.0762510841964, 12.2159027774138, 13.018281797489, 12.9735206060674, 13.0711514984742, 12.2"| __truncated__ "8.16211477338736, 8.23873418475968, 9.11809057942399, 7.17900185115043, 7.82219438387582, 8.32183079980092, 6.5"| __truncated__ ...
#>   .. ..$ Experiment_ID         : chr [1:2166] "GSE30611, ERP003613, GSE30352, SRP058036, SRP012682" "GSE30611, ERP003613, GSE30352, SRP058036, SRP012682" "GSE30611, ERP003613, GSE30352, SRP058036, SRP012682" "GSE30611, ERP003613, GSE30352, SRP058036, SRP012682" ...
#>   .. ..$ Anatomical_entity_ID  : chr [1:2166] "UBERON:0000082" "UBERON:0000082" "UBERON:0000082" "UBERON:0000082" ...
#>   .. ..$ Scaled_Median_VST     : num [1:2166] 0.0645 0.0645 0.4075 0.1971 0.4025 ...
#>   .. ..$ Species               : chr [1:2166] "Homo_sapiens" "Homo_sapiens" "Homo_sapiens" "Homo_sapiens" ...
#>   ..@ metric_type      : chr "CV_Species"
#>   ..@ metric           :'data.frame':    1 obs. of  1 variable:
#>   .. ..$ X0: num 0
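
Note that the VST column in the str() output is of type character: each entry holds several per-sample values in one comma-separated string. If those values are needed as numbers, one way to unpack a single entry in base R is sketched below. The string is a shortened copy of the first entry shown above, and the variable names are hypothetical.

```r
# A shortened copy of one VST entry from the str() output:
# several per-sample values stored in a single string.
vst_string <- "3.54564422116888, 4.92275162664869, 5.75395660803669"

# Split on commas (with optional whitespace) and convert to numeric.
vst_values <- as.numeric(strsplit(vst_string, ",\\s*")[[1]])

# Summarise the per-sample values, for example with the median.
vst_median <- median(vst_values)
print(vst_values)
print(vst_median)
```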

CoSIAn_Obj_gexplot <- CoSIA::plotSpeciesGEx(CoSIAn_Obj_gex, "adult mammalian kidney", "ENSG00000171766")

CoSIAn_Obj_gexplot

Here there should be words of the interpretation of the use case gene set.


1.3.3 Use Case #3: Gene expression variation across species for kidney tissue by calculating and visualizing the Coefficient of Variation

Here there should be words describing the metric and how it is calculated
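
As a rough illustration of what a median-based Coefficient of Variation can look like, the sketch below divides the standard deviation by the median. This formula is an assumption made for illustration; the exact calculation performed by getGExMetrics may differ. The helper `median_cv` and the expression values are hypothetical.

```r
# Hypothetical helper: a median-based coefficient of variation,
# ASSUMED here to be the standard deviation divided by the median.
median_cv <- function(x) {
  stopifnot(is.numeric(x), length(x) > 1)
  stats::sd(x) / stats::median(x)
}

# Illustrative expression values for one gene in two species.
expr <- list(
  species_a = c(8, 10, 12),
  species_b = c(2, 10, 18)
)

# Same median in both species, but species_b is far more variable.
cv_by_species <- vapply(expr, median_cv, numeric(1))
print(cv_by_species)
```

Under this definition a larger value indicates expression that varies more relative to its typical level, which is the sense of "variability" used in the module overview.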


CoSIAn_Obj_CV <- CoSIA::getGExMetrics(CoSIAn_Obj_gex)

CoSIAn_Obj_CVplot <- CoSIA::plotCVGEx(CoSIAn_Obj_CV)


CoSIAn_Obj_CVplot
#> $rect
#> $rect$w
#> [1] 0.5754659
#> 
#> $rect$h
#> [1] 84.81013
#> 
#> $rect$left
#> [1] 0.6873341
#> 
#> $rect$top
#> [1] 348.4
#> 
#> 
#> $text
#> $text$x
#> [1] 0.7821214 0.7821214 0.7821214
#> 
#> $text$y
#> [1] 327.1975 305.9949 284.7924

Here there should be words of the interpretation of the use case gene set.


1.3.4 Use Case #4: Gene expression diversity and specificity across tissues and species for monogenic kidney disease-associated genes

Here there should be words describing the metric and how it is calculated
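
For orientation, Shannon entropy measures how evenly a quantity is spread across categories. The sketch below computes the standard entropy, H = -sum(p * log2(p)), on expression values normalised to proportions. How CoSIA derives its diversity and specificity scores from entropy is not shown here, and the helper `shannon_entropy` and the example values are hypothetical.

```r
# Hypothetical helper: Shannon entropy (in bits) of non-negative values
# after normalising them to proportions that sum to 1.
shannon_entropy <- function(x) {
  stopifnot(is.numeric(x), all(x >= 0), sum(x) > 0)
  p <- x / sum(x)
  p <- p[p > 0]  # zero proportions contribute nothing to the sum
  -sum(p * log2(p))
}

# Illustrative expression of two genes across four tissues.
uniform_gene  <- c(kidney = 5, heart = 5, liver = 5, lung = 5)
specific_gene <- c(kidney = 20, heart = 0, liver = 0, lung = 0)

h_uniform  <- shannon_entropy(uniform_gene)   # evenly expressed
h_specific <- shannon_entropy(specific_gene)  # expressed in one tissue
print(c(uniform = h_uniform, specific = h_specific))
```

Entropy is highest when expression is spread evenly across tissues and falls to zero when expression is confined to a single tissue.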


CoSIAn_Obj_DS <- CoSIA::CoSIAn(gene_set = unique(monogenic_kidney_genes$Gene),
                            i_species = "h_sapiens",
                            o_species = c("h_sapiens", "m_musculus", "r_norvegicus"),
                            input_id = "Symbol",
                            output_ids = "Ensembl_id",
                            map_species = c("h_sapiens", "m_musculus", "r_norvegicus"),
                            map_tissues = c("adult mammalian kidney", "heart"),
                            mapping_tool = "annotationDBI",
                            ortholog_database = "HomoloGene",
                            metric_type = "DS_Tissue"
                            )

CoSIAn_Obj_DS <- CoSIA::getConversion(CoSIAn_Obj_DS)

CoSIAn_Obj_DS <- CoSIA::getGExMetrics(CoSIAn_Obj_DS)

CoSIAn_Obj_DSplot <- CoSIA::plotDSGEx(CoSIAn_Obj_DS)

CoSIAn_Obj_DSplot

Here there should be words of the interpretation of the use case gene set.

Session info

sessionInfo()
#> R version 4.2.2 (2022-10-31)
#> Platform: x86_64-apple-darwin17.0 (64-bit)
#> Running under: macOS Big Sur ... 10.16
#> 
#> Matrix products: default
#> BLAS:   /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRblas.0.dylib
#> LAPACK: /Library/Frameworks/R.framework/Versions/4.2/Resources/lib/libRlapack.dylib
#> 
#> locale:
#> [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] CoSIA_0.99.0     BiocStyle_2.26.0
#> 
#> loaded via a namespace (and not attached):
#>  [1] Biobase_2.58.0            httr_1.4.4               
#>  [3] sass_0.4.4                tidyr_1.2.1              
#>  [5] org.Rn.eg.db_3.16.0       bit64_4.0.5              
#>  [7] jsonlite_1.8.4            viridisLite_0.4.1        
#>  [9] bslib_0.4.2               assertthat_0.2.1         
#> [11] highr_0.10                BiocManager_1.30.19      
#> [13] stats4_4.2.2              blob_1.2.3               
#> [15] GenomeInfoDbData_1.2.9    yaml_2.3.6               
#> [17] pillar_1.8.1              RSQLite_2.2.20           
#> [19] glue_1.6.2                digest_0.6.31            
#> [21] RColorBrewer_1.1-3        XVector_0.38.0           
#> [23] colorspace_2.1-0          htmltools_0.5.4          
#> [25] pkgconfig_2.0.3           magick_2.7.3             
#> [27] bookdown_0.32             zlibbioc_1.44.0          
#> [29] org.Mm.eg.db_3.16.0       purrr_1.0.1              
#> [31] scales_1.2.1              tibble_3.1.8             
#> [33] KEGGREST_1.38.0           generics_0.1.3           
#> [35] farver_2.1.1              IRanges_2.32.0           
#> [37] ggplot2_3.4.0             ellipsis_0.3.2           
#> [39] cachem_1.0.6              withr_2.5.0              
#> [41] BiocGenerics_0.44.0       lazyeval_0.2.2           
#> [43] cli_3.6.0                 magrittr_2.0.3           
#> [45] crayon_1.5.2              memoise_2.0.1            
#> [47] evaluate_0.20             fansi_1.0.4              
#> [49] homologene_1.4.68.19.3.27 tools_4.2.2              
#> [51] data.table_1.14.6         org.Hs.eg.db_3.16.0      
#> [53] lifecycle_1.0.3           stringr_1.5.0            
#> [55] plotly_4.10.1             S4Vectors_0.36.1         
#> [57] munsell_0.5.0             AnnotationDbi_1.60.0     
#> [59] Biostrings_2.66.0         compiler_4.2.2           
#> [61] jquerylib_0.1.4           GenomeInfoDb_1.34.6      
#> [63] rlang_1.0.6               grid_4.2.2               
#> [65] RCurl_1.98-1.9            rstudioapi_0.14          
#> [67] htmlwidgets_1.6.1         crosstalk_1.2.0          
#> [69] labeling_0.4.2            bitops_1.0-7             
#> [71] rmarkdown_2.20            gtable_0.3.1             
#> [73] annotationTools_1.72.0    DBI_1.1.3                
#> [75] R6_2.5.1                  knitr_1.41               
#> [77] dplyr_1.0.10              fastmap_1.1.0            
#> [79] bit_4.0.5                 utf8_1.2.2               
#> [81] stringi_1.7.12            Rcpp_1.0.10              
#> [83] vctrs_0.5.2               png_0.1-8                
#> [85] tidyselect_1.2.0          xfun_0.36